Introduction

Search words

The two words I used for my search were “lake boat”. Since I love water and lakes, I wanted to get lake pictures, so I first searched “silent lake”, but that search did not return more than 200 pictures. Among those results I found a nice picture of a boat on a lake, so I changed my search words to “lake boat”.

screenshot of the first few rows of royalty-free photos

Observed features of the photos

  1. Tags: common tags are “lake”, “boat” and “sunset”

  2. Page orientation: mostly landscape

  3. Colors: the main color is blue, and orange is the next most common color.

knitr::kable(photo_data %>% select(pageURL))
pageURL
https://pixabay.com/photos/sailboats-lake-constance-8337698/
https://pixabay.com/photos/boat-river-forest-landscape-woods-6579441/
https://pixabay.com/photos/boat-sea-yacht-lake-water-ocean-8123031/
https://pixabay.com/photos/lake-balaton-lake-water-port-7343151/
https://pixabay.com/photos/boat-lake-trees-forest-fall-8332114/
https://pixabay.com/photos/boat-reflection-sea-sky-8275962/
https://pixabay.com/photos/mountains-lake-cruise-alps-8419249/
https://pixabay.com/photos/sea-sunset-boat-sunrise-boat-boat-4747601/
https://pixabay.com/photos/lugu-lake-boat-lake-blue-horizon-8679121/
https://pixabay.com/photos/boat-lake-waves-sea-sail-boat-8244952/
https://pixabay.com/photos/man-boat-wooden-boat-lake-trees-8265950/
https://pixabay.com/photos/lake-lake-balaton-water-nature-2822394/
https://pixabay.com/photos/boat-sunset-dawn-nature-waterfront-3085643/
https://pixabay.com/photos/boat-lake-sleet-old-wooden-7639560/
https://pixabay.com/photos/sunset-lake-fisherman-boat-dusk-8554635/
https://pixabay.com/photos/boat-the-needle-man-fisherman-8483163/
https://pixabay.com/photos/seagull-gull-sunset-marina-boat-7401497/
https://pixabay.com/photos/konigssee-lake-germany-6635951/
https://pixabay.com/photos/lake-mountains-nature-travel-6786472/
https://pixabay.com/photos/boat-at-night-lake-balaton-water-1607395/
https://pixabay.com/photos/sailing-boat-sunset-summer-nature-3758189/
https://pixabay.com/photos/copenhagen-river-denmark-8513129/
https://pixabay.com/photos/boat-water-rope-lake-wooden-boat-8219886/
https://pixabay.com/photos/boat-sea-ocean-vessel-adventure-8515980/
https://pixabay.com/photos/boat-wooden-boat-water-landscape-4320990/
https://pixabay.com/photos/woman-model-pose-style-fashion-6569323/
https://pixabay.com/photos/boat-houses-lake-boat-storage-6839649/
https://pixabay.com/photos/rowing-boat-lake-%C5%82%C3%B3d%C5%BA-wooden-boat-7412562/
https://pixabay.com/photos/mountains-boat-lake-reflections-5527253/
https://pixabay.com/photos/boat-lake-dock-coast-riverbank-8328744/
https://pixabay.com/photos/boat-house-reflections-lake-5103798/
https://pixabay.com/photos/landscape-rowing-boat-sunset-water-4417906/
https://pixabay.com/photos/boat-part-lake-balaton-ship-nature-3473572/
https://pixabay.com/photos/boat-lake-water-water-lily-boat-1699294/
https://pixabay.com/photos/rowing-boat-lake-%C5%82%C3%B3d%C5%BA-wooden-boat-7412542/
https://pixabay.com/photos/boat-sailboat-lake-lake-como-7767626/
https://pixabay.com/photos/lake-boat-dusk-sunset-landscape-6815353/
https://pixabay.com/photos/rowing-boat-lake-boat-birch-trees-7321802/
https://pixabay.com/photos/boat-lake-kastoria-3384017/
https://pixabay.com/photos/boat-lake-landscape-nature-reeds-2640630/
https://pixabay.com/photos/wooden-boat-lake-reflection-summer-4395843/
https://pixabay.com/photos/lake-boat-morning-calm-serenity-7725571/
https://pixabay.com/photos/sea-waves-sailboat-boat-catamaran-8581529/
https://pixabay.com/photos/mountains-nature-lake-travel-6963913/
https://pixabay.com/photos/boat-bird-fog-water-landscape-4511720/
https://pixabay.com/photos/boat-lake-water-landscape-nature-4836647/
https://pixabay.com/photos/vietnam-lake-sunset-mountain-boat-7427067/
https://pixabay.com/photos/lake-village-church-hallstatt-8357182/
https://pixabay.com/photos/lake-nature-boat-travel-6749203/
https://pixabay.com/photos/reflections-boat-lake-3836448/
These are the 50 pictures that I got when I searched “lake boat”.

Key features of selected photos

meanLikesProportion <- photo_data$likesProportion %>% mean(na.rm=TRUE)
total_photos <- nrow(photo_data)
meanViews <- photo_data$views %>% mean(na.rm=TRUE)
boat_photos <- sum(str_detect(photo_data$tags, "boat"))
  1. The mean proportion of likes for the selected photos is 2.2%.

  2. A total of 50 photos were selected for analysis.

  3. Among the selected photos, 37 have “boat” tags.

  4. The mean number of views for the selected photos is 6043.
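
A note on the third figure: str_detect() matches substrings, so a photo tagged only “sailboats” would also be counted as a “boat” photo. The sketch below illustrates the difference on made-up tag strings (example_tags is a hypothetical vector, not the real photo_data), and the whole-tag pattern is just one possible way to count exact matches.

```r
library(stringr)

# Hypothetical tag strings in the same comma-separated format as photo_data$tags
example_tags <- c("sailboats, lake, water",
                  "boat, river, forest",
                  "lake, mountains, nature")

# Substring match: counts any photo whose tags contain the letters "boat"
substring_count <- sum(str_detect(example_tags, "boat"))

# Whole-tag match: counts only photos that have a tag that is exactly "boat"
whole_tag_count <- sum(str_detect(example_tags, "(^|, )boat(,|$)"))

substring_count
whole_tag_count
```

Either count can be reasonable; which one to report depends on whether tags such as “sailboat” should be treated as boat photos.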

Creativity

# Separate rows on ", " so that each row holds a single tag
tags_new <- photo_data%>%
  separate_rows(tags, sep = ", ")
tags_new
## # A tibble: 150 × 7
##    previewURL      pageURL tags  likesProportion pageOrientation sizeLevel views
##    <chr>           <chr>   <chr>           <dbl> <chr>           <chr>     <dbl>
##  1 https://cdn.pi… https:… sail…           1.86  landscape       mid        7631
##  2 https://cdn.pi… https:… lake…           1.86  landscape       mid        7631
##  3 https://cdn.pi… https:… wate…           1.86  landscape       mid        7631
##  4 https://cdn.pi… https:… boat            0.903 landscape       large     28227
##  5 https://cdn.pi… https:… river           0.903 landscape       large     28227
##  6 https://cdn.pi… https:… fore…           0.903 landscape       large     28227
##  7 https://cdn.pi… https:… boat            1.00  landscape       large      4089
##  8 https://cdn.pi… https:… sea             1.00  landscape       large      4089
##  9 https://cdn.pi… https:… yacht           1.00  landscape       large      4089
## 10 https://cdn.pi… https:… lake…           2.15  landscape       large      1810
## # ℹ 140 more rows
# Count how many times each tag appears
tags_count <- tags_new %>%
  group_by(tags) %>%
  summarise(n())
# Rename the second column, "n()", to "freq"
tags_count <- tags_count %>%
  rename(freq = 2)
# Keep the tags whose frequency is among the five highest counts
# (ties can return more than five tags)
top_tags_counts <- tags_count %>%
  filter(freq %in% head(sort(freq, decreasing = TRUE), 5))

  
# Plot the bar plot
ggplot(top_tags_counts, aes(x = tags, y = freq)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Tags", y = "Frequency", title = "Top 5 Most Common Tags") 

I demonstrated creativity by creating a bar plot of the five most common tags. To get one tag per row, I reused content from Lab3A and Lab3B: the separate_rows() function to split the tags variable on “, ”, followed by group_by(), rename(), and filter().
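
The filter() step in the code above keeps every tag whose frequency equals one of the five highest counts, so ties can produce more than five bars. As a possible alternative (not the approach used in this report), the counting and selection could be sketched with count() and slice_max(). The example_photos tibble is made-up data in the same shape as the tags column, and n = 2 is used only to keep the example small.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data: one row per photo, tags separated by ", "
example_photos <- tibble(
  pageURL = c("a", "b", "c", "d"),
  tags = c("boat, lake, sunset", "boat, lake, water",
           "boat, sea, water", "lake, nature, fog")
)

top_tags <- example_photos %>%
  separate_rows(tags, sep = ", ") %>%        # one tag per row
  count(tags, name = "freq") %>%             # frequency of each tag
  slice_max(freq, n = 2, with_ties = FALSE)  # exactly n rows, ties cut off

top_tags
```

With with_ties = FALSE the result always has exactly n rows, which keeps a “Top 5” plot title accurate. The trade-off is that tags tied at the cut-off are dropped according to row order.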

Learning reflection

In Module 3, I gained extensive knowledge of manipulating data from CSV and JSON formats by completing the lab tasks and the project. It was interesting to discover how I could craft a new data frame tailored to my exploration needs by manipulating and summarizing data. Additionally, I learned to create calculated variables derived from dataset exploration. Using an API in the project further deepened my understanding of how APIs work and how they are applied. This experience not only enhanced my data manipulation skills but also broadened my comprehension of APIs and their importance in data analysis. Lastly, reusing some functions from previous labs was a good way to revise.

Appendix

library(tidyverse)
library(jsonlite)
library(magick)  # provides image_read(), image_animate() and image_write() used below

json_data <- fromJSON("pixabay_data.json")
pixabay_photo_data <- json_data$hits
names(pixabay_photo_data)
# Quantiles of image size: 0%, 25%, 50%, 75%, 100%
quantiles <- pixabay_photo_data %>%
  pull(imageSize) %>%
  quantile()
selected_photos <- pixabay_photo_data %>%
  # quantiles[1] is the minimum image size and quantiles[2] is the 25th percentile
  mutate(sizeLevel = ifelse(imageSize <= quantiles[1], "small", 
                            ifelse(imageSize > quantiles[1] & imageSize <= quantiles[2], "mid", "large")),
         pageOrientation = ifelse(previewWidth >= previewHeight, "landscape", "portrait"),
         likesProportion = round((likes / views) * 100, 3)) %>%
  select(previewURL, pageURL, tags, likesProportion, pageOrientation, sizeLevel, views) %>%
  # Keep only the photos above the 75th percentile of likesProportion
  filter(likesProportion > quantile(likesProportion, 0.75))
write_csv(selected_photos, "selected_photos.csv")
  
meanLikesProportion <- selected_photos$likesProportion %>% mean(na.rm=TRUE)
meanViews <- selected_photos$views %>% mean(na.rm=TRUE)
boat_photos <- sum(str_detect(selected_photos$tags, "boat"))

total_photos <- nrow(selected_photos)


selected_summaries <- selected_photos%>%
  group_by(pageOrientation)%>%
  summarise(n())

animation <- selected_photos %>%
  pull(previewURL)%>%
  image_read() %>%
  image_animate(fps = 5)
animation

image_write(animation, "my_photos.gif")
# Separate rows on ", " so that each row holds a single tag
tags_new <- selected_photos%>%
  separate_rows(tags, sep = ", ")
tags_new
# Count how many times each tag appears
tags_count <- tags_new %>%
  group_by(tags) %>%
  summarise(n())
# Rename the second column, "n()", to "freq"
tags_count <- tags_count %>%
  rename(freq = 2)
# Keep the tags whose frequency is among the five highest counts
# (ties can return more than five tags)
top_tags_counts <- tags_count %>%
  filter(freq %in% head(sort(freq, decreasing = TRUE), 5))

  
# Plot the bar plot
ggplot(top_tags_counts, aes(x = tags, y = freq)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Tags", y = "Frequency", title = "Top 5 Most Common Tags")
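
The sizeLevel step earlier in this appendix labels photos with nested ifelse() calls on quantile cut points. A minimal sketch of the same idea using cut() is shown below. It is an assumption-laden alternative, not the code used for this report: the cut points here are the 25th and 75th percentiles (the appendix code uses the minimum and the 25th percentile), and image_sizes is a made-up vector.

```r
# Hypothetical image sizes (the real values come from pixabay_photo_data$imageSize)
image_sizes <- c(100, 200, 300, 400, 500, 600, 700, 800)

# Cut points: 25th and 75th percentiles (an assumption for this sketch)
q <- quantile(image_sizes, probs = c(0.25, 0.75))

# Label each size by the interval it falls into; intervals are closed on the right
size_level <- cut(image_sizes,
                  breaks = c(-Inf, q[[1]], q[[2]], Inf),
                  labels = c("small", "mid", "large"))

table(size_level)
```

cut() returns a factor, so the levels keep their small, mid, large order in tables and plots without extra sorting.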